gh-150942: Speed up csv.reader row building by omkar-334 · Pull Request #150995 · python/cpython

omkar-334 · 2026-06-06T01:53:26Z

Use _PyList_AppendTakeRef while collecting fields in csv.reader instead of
PyList_Append followed by Py_DECREF, removing an incref/decref pair per
parsed field.
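The reference-count effect described above can be observed from Python. This is only an illustrative sketch, not code from the PR: `field` and `fields` are stand-in names, and the snippet shows the Python-level `list.append`, which, like `PyList_Append`, makes the list hold its own new reference to the item. That extra reference is why the C caller previously had to drop its own with `Py_DECREF`.

```python
import sys

# Stand-in for a freshly parsed field (a unique, non-immortal object).
field = object()
fields = []

before = sys.getrefcount(field)
fields.append(field)          # the list now holds its own reference
after = sys.getrefcount(field)

# Number of references added by the append.
added_refs = after - before
print(added_refs)
```

A reference-stealing append skips this extra increment by handing the caller's existing reference over to the list, so no matching decrement is needed afterwards.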

Microbenchmarks

Release build, macOS

Benchmark	main	this PR	speedup
`csv_reader_2m_fields`	86.96 ms	79.23 ms	1.10x
`csv_reader_400k_fields_8col`	17.15 ms	16.26 ms	1.05x
`csv_reader_80k_fields`	3.17 ms	2.89 ms	1.10x
geomean			1.08x
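The speedup column and the geomean row can be recomputed from the timings in the table. The sketch below assumes the speedup is simply `main / this PR` and that the geomean is taken over those three ratios; the variable names are illustrative.

```python
from statistics import geometric_mean

# (main, this PR) timings in milliseconds, copied from the table above.
timings_ms = {
    "csv_reader_2m_fields": (86.96, 79.23),
    "csv_reader_400k_fields_8col": (17.15, 16.26),
    "csv_reader_80k_fields": (3.17, 2.89),
}

# Speedup is the baseline time divided by the time with the patch applied.
speedups = {name: main / pr for name, (main, pr) in timings_ms.items()}
geomean = geometric_mean(list(speedups.values()))

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
print(f"geomean: {geomean:.2f}x")
```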

Benchmark script

"""Micro-benchmark for csv.reader field-append hot path.

Generates an in-memory CSV with many fields per row, then times how long it
takes to iterate the reader fully. Repeats N trials, reports min/median/mean
so noise is visible. Writes per-trial timings to results/.
"""

from __future__ import annotations

import argparse
import csv
import io
import json
import statistics
import sys
import time
from pathlib import Path


def make_csv(rows: int, cols: int) -> str:
    buf = io.StringIO()
    w = csv.writer(buf)
    for r in range(rows):
        w.writerow([f"v{r}_{c}" for c in range(cols)])
    return buf.getvalue()


def time_reader(data: str) -> float:
    t0 = time.perf_counter_ns()
    for _ in csv.reader(io.StringIO(data)):
        pass
    return (time.perf_counter_ns() - t0) / 1e9


def main() -> int:
    ap = argparse.ArgumentParser()
    ap.add_argument("--rows", type=int, default=20_000)
    ap.add_argument("--cols", type=int, default=20)
    ap.add_argument("--trials", type=int, default=15)
    ap.add_argument("--warmup", type=int, default=3)
    ap.add_argument("--label", default="run")
    ap.add_argument("--outdir", default="results/gh-150942")
    args = ap.parse_args()

    data = make_csv(args.rows, args.cols)
    size_bytes = len(data.encode())
    field_count = args.rows * args.cols

    for _ in range(args.warmup):
        time_reader(data)

    trials = [time_reader(data) for _ in range(args.trials)]

    outdir = Path(args.outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    summary = {
        "label": args.label,
        "python": sys.executable,
        "rows": args.rows,
        "cols": args.cols,
        "trials": args.trials,
        "warmup": args.warmup,
        "field_count": field_count,
        "input_bytes": size_bytes,
        "seconds": {
            "min": min(trials),
            "median": statistics.median(trials),
            "mean": statistics.fmean(trials),
            "stdev": statistics.pstdev(trials),
            "all": trials,
        },
        "fields_per_sec_at_min": field_count / min(trials),
    }
    out = outdir / f"bench_{args.label}.json"
    out.write_text(json.dumps(summary, indent=2))

    s = summary["seconds"]
    print(
        f"{args.label}: min={s['min']*1000:.2f}ms median={s['median']*1000:.2f}ms "
        f"mean={s['mean']*1000:.2f}ms stdev={s['stdev']*1000:.2f}ms "
        f"fields/s@min={summary['fields_per_sec_at_min']:.2e}"
    )
    print(f"wrote {out}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
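The script writes one `bench_<label>.json` file per run. A small helper along the following lines could compare two such files. This is a sketch, not part of the PR: the function names and the `main` / `pr` labels are hypothetical, and only the `seconds.min` key written by the script above is relied on.

```python
import json
from pathlib import Path


def load_min_seconds(path):
    """Return the fastest trial time recorded in a summary file."""
    summary = json.loads(Path(path).read_text())
    return summary["seconds"]["min"]


def speedup_at_min(baseline_path, candidate_path):
    """Ratio of baseline to candidate minimum time (>1 means candidate is faster)."""
    return load_min_seconds(baseline_path) / load_min_seconds(candidate_path)


# Hypothetical usage, assuming the script was run once per build with
# --label main and --label pr:
# print(speedup_at_min("results/gh-150942/bench_main.json",
#                      "results/gh-150942/bench_pr.json"))
```

Comparing minimum times rather than means tends to reduce the influence of scheduling noise, which matches the `fields_per_sec_at_min` figure the script already reports.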

eendebakpt · 2026-06-06T07:39:24Z

    }
-    if (PyList_Append(self->fields, field) < 0) {
-        Py_DECREF(field);
+    if (_PyList_AppendTakeRef((PyListObject *)self->fields, field) < 0) {


The list used here is self->fields. Can we guarantee that it is not mutated by concurrent threads?

When parse_save_field is running, self->fields is only reachable through self, and self is locked by `Reader_iternext`'s critical section. I don't think any other thread can mutate the list.

static PyObject *
Reader_iternext(PyObject *op)
{
    PyObject *result;
    Py_BEGIN_CRITICAL_SECTION(op);
    result = Reader_iternext_lock_held(op);
    Py_END_CRITICAL_SECTION();
    return result;
}

What do you think?

cc @eendebakpt, just a nudge

The object is locked and self->fields is not accessible via other paths so this is indeed safe.
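The safety argument can be illustrated with a rough Python analogue. This is a toy model under stated assumptions, not the csv module's implementation: `LockedReader`, `_next_lock_held`, and the use of `threading.Lock` in place of a per-object critical section are all hypothetical stand-ins. The point it shows is that the internal list is only touched while the object's lock is held and is never exposed until the row is complete.

```python
import threading


class LockedReader:
    """Toy reader: a per-object lock guards a private list of fields."""

    def __init__(self, lines):
        self._lines = iter(lines)
        self._lock = threading.Lock()   # stand-in for the critical section
        self._fields = None             # only reachable through self

    def __iter__(self):
        return self

    def __next__(self):
        # Analogue of Reader_iternext: take the lock, then delegate.
        with self._lock:
            return self._next_lock_held()

    def _next_lock_held(self):
        line = next(self._lines)        # StopIteration ends iteration
        self._fields = []
        for field in line.split(","):
            # No other thread can see or mutate self._fields here,
            # because every path to it goes through the lock.
            self._fields.append(field)
        row, self._fields = self._fields, None
        return row


print(list(LockedReader(["a,b", "c,d"])))  # [['a', 'b'], ['c', 'd']]
```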

If redesigning, I would probably make fields (and maybe some other fields) a local variable inside Reader_iternext_lock_held. The only minor thing to take care of is that fields currently acts as a guard against re-entrant calls. This is out of scope for the PR though.
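As a rough Python sketch of that idea (explicitly not the csv module's code, and not something this PR does), the row list could live in a local variable while a separate flag takes over the re-entrancy guard. `LocalFieldsReader` and `_parsing` are hypothetical names introduced only for illustration.

```python
class LocalFieldsReader:
    """Toy reader: fields is a local; a flag guards against re-entry."""

    def __init__(self, lines):
        self._lines = iter(lines)
        self._parsing = False   # hypothetical explicit re-entrancy guard

    def __iter__(self):
        return self

    def __next__(self):
        if self._parsing:
            raise RuntimeError("reader is already parsing a row")
        self._parsing = True
        try:
            fields = []                      # local, never stored on self
            line = next(self._lines)         # may call back into user code
            fields.extend(line.split(","))
            return fields
        finally:
            self._parsing = False


holder = {}


def source():
    """Line source that tries to re-enter the reader mid-parse."""
    yield "a,b"
    try:
        next(holder["reader"])               # re-entrant call
    except RuntimeError as exc:
        holder["error"] = str(exc)
    yield "c,d"


holder["reader"] = LocalFieldsReader(source())
first = next(holder["reader"])
second = next(holder["reader"])
```

Because the list never lives on the object, no other code path could reach it at all, which makes the locking argument for the append trivially true; the cost is that the guard has to be tracked separately.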

I would drop the news entry (or at least omit the implementation details).

@omkar-334 This needs to be reviewed by a core dev, so it might take some time.

pythongh-150942: Speed up csv.reader row building

7945e61

bedevere-app Bot added the awaiting review label Jun 6, 2026

bedevere-app Bot mentioned this pull request Jun 6, 2026

Improve performance by using reference stealing methods #150942

Open

omkar-334 added 3 commits June 6, 2026 07:23

Merge branch 'main' into pythongh-150942-csv-appendtakeref

1c00c00

add patch contributor name

ed09dac

Merge branch 'main' into pythongh-150942-csv-appendtakeref

b29bdae

eendebakpt reviewed Jun 6, 2026

omkar-334 wants to merge 4 commits into python:main from omkar-334:gh-150942-csv-appendtakeref

eendebakpt Jun 6, 2026

omkar-334 Jun 10, 2026

omkar-334 Jun 10, 2026

omkar-334 Jun 18, 2026

eendebakpt Jun 19, 2026

2 participants
